130 research outputs found

    Improving data identification and tagging for more effective decision making in agriculture

    International audience

    Building a biomedical ontology recommender web service

    Background: Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. As the use of ontologies for annotating biomedical datasets has grown, a common challenge is to identify the ontologies best suited to annotating specific datasets. The number and variety of biomedical ontologies are large, and it is cumbersome for a researcher to determine which ontology to use.
    Methods: We present the Biomedical Ontology Recommender web service. The system takes textual metadata or a set of keywords describing a domain of interest and suggests appropriate ontologies for annotating or representing the data. The service bases its decision on three criteria. The first is coverage: the ontologies that provide the most terms covering the input text. The second is connectivity: the ontologies that are most often mapped to by other ontologies. The final criterion is size: the number of concepts in the ontologies. The service scores the ontologies as a function of the scores of the annotations created by the National Center for Biomedical Ontology (NCBO) Annotator web service. We used all the ontologies from the UMLS Metathesaurus and the NCBO BioPortal.
    Results: We compare and contrast our Recommender with previously published efforts through an exhaustive functional comparison. We evaluate and discuss the results of several recommendation heuristics in the context of three real-world use cases. The best recommendation heuristics, rated ‘very relevant’ by expert evaluators, are those based on the coverage and connectivity criteria. The Recommender service (alpha version) is available to the community and is embedded into BioPortal.
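To make the coverage criterion concrete, here is a minimal sketch that ranks ontologies by how many distinct input terms their annotations cover. It assumes annotator output has already been reduced to hypothetical (term, ontology acronym) pairs; the acronyms, the sample data, and the simple count-based score are illustrative assumptions, not the service's actual scoring function, which derives from NCBO Annotator annotation scores.

```python
from collections import defaultdict

# Hypothetical annotator output: (matched input term, ontology acronym) pairs.
# Real NCBO Annotator responses carry much more (class IRIs, match types, offsets).
Annotation = tuple[str, str]


def rank_by_coverage(annotations: list[Annotation]) -> list[tuple[str, int]]:
    """Rank ontologies by how many distinct input terms their annotations cover."""
    covered: dict[str, set[str]] = defaultdict(set)
    for term, ontology in annotations:
        # Case-fold so "Melanoma" and "melanoma" count as one covered term.
        covered[ontology].add(term.casefold())
    # Highest coverage first; ties broken alphabetically for a stable order.
    return sorted(
        ((ontology, len(terms)) for ontology, terms in covered.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


if __name__ == "__main__":
    sample = [
        ("melanoma", "NCIT"),
        ("Melanoma", "NCIT"),
        ("skin", "NCIT"),
        ("biopsy", "NCIT"),
        ("melanoma", "SNOMEDCT"),
        ("skin", "FMA"),
    ]
    print(rank_by_coverage(sample))  # [('NCIT', 3), ('FMA', 1), ('SNOMEDCT', 1)]
```

Connectivity and size could be layered on top as tie-breakers or extra weighted terms; how the original service weighs them is not specified in this abstract.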

    Building an effective and efficient background knowledge resource to enhance ontology matching

    International audience
    Ontology matching is critical for data integration and interoperability. Original ontology matching approaches relied solely on the content of the ontologies to be aligned. However, these approaches are less effective when equivalent concepts have dissimilar labels and are structured according to different modeling views. To overcome this semantic heterogeneity, the community has turned to external background knowledge resources. Several methods have been proposed to select ontologies, other than the ones to be aligned, as background knowledge to enhance a given ontology-matching task. However, these methods return a set of complete ontologies, whereas in most cases only fragments of the returned ontologies are effective for discovering new mappings. In this article, we propose an approach to select and build a background knowledge resource containing just the right concepts chosen from a set of ontologies, which improves efficiency without loss of effectiveness. The use of background knowledge in ontology matching is a double-edged sword: while it may increase recall (i.e., retrieve more correct mappings), it may lower precision (i.e., produce more incorrect mappings). Therefore, we propose two methods to select the most relevant mappings from the candidate ones: (1) a selection based on a set of rules and (2) a selection based on supervised machine learning. Our experiments, conducted on two Ontology Alignment Evaluation Initiative (OAEI) datasets, confirm the effectiveness and efficiency of our approach. Moreover, the F-measure values obtained with our approach are very competitive with those of state-of-the-art matchers exploiting background knowledge resources.
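The recall/precision trade-off described above, and the F-measure used to report it, can be illustrated by scoring candidate mappings against a reference alignment. This is a minimal sketch assuming mappings are unordered equivalence pairs of concept IDs (real OAEI alignments can also carry relation types and confidence values); the concept IDs and both matcher outputs are invented for illustration.

```python
def mapping(concept_a: str, concept_b: str) -> frozenset:
    """An equivalence mapping as an unordered pair, so (a, b) equals (b, a)."""
    return frozenset((concept_a, concept_b))


def evaluate(found: set, reference: set) -> tuple[float, float, float]:
    """Precision, recall, and F-measure of `found` against a reference alignment."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = {mapping("A:1", "B:1"), mapping("A:2", "B:2"),
                 mapping("A:3", "B:3"), mapping("A:4", "B:4")}
    # Hypothetical output without background knowledge: precise but incomplete.
    direct = {mapping("A:1", "B:1"), mapping("A:2", "B:2")}
    # Hypothetical output with background knowledge: one more correct, one incorrect.
    with_bk = direct | {mapping("B:3", "A:3"), mapping("A:4", "B:9")}
    print(evaluate(direct, reference))   # precision 1.0, recall 0.5, F about 0.667
    print(evaluate(with_bk, reference))  # (0.75, 0.75, 0.75)
```

In this toy case background knowledge raises recall from 0.5 to 0.75 while precision drops from 1.0 to 0.75, which is the kind of loss the rule-based or learned mapping selection is meant to limit.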

    NCBO Ontology Recommender 2.0: An Enhanced Approach for Biomedical Ontology Recommendation

    Biomedical researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability. However, the number, variety, and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a new recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher-quality suggestions than the original approach: better coverage of the input data, more detailed information about their concepts, greater specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information and suggests not only individual ontologies but also groups of ontologies. It can also be customized to fit the needs of different scenarios. Ontology Recommender 2.0 combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. It recommends from among more than 500 biomedical ontologies on the NCBO BioPortal platform, where it is openly available.
    Comment: 29 pages, 8 figures, 11 tables
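One simple way to fold the four criteria into a single relevance score is a weighted sum of per-criterion scores, each normalized to [0, 1]. The sketch below assumes exactly that form; the weights, the ontology names `ONTO_A`/`ONTO_B`, and the criterion scores are illustrative placeholders, not the values or formulas used by Ontology Recommender 2.0. Passing custom weights loosely mirrors the customization the abstract mentions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CriteriaScores:
    """Scores of one candidate ontology on the four criteria, each assumed in [0, 1]."""
    coverage: float
    acceptance: float
    detail: float
    specialization: float


# Illustrative weights only (hypothetical); they sum to 1 so scores stay in [0, 1].
DEFAULT_WEIGHTS = {"coverage": 0.4, "acceptance": 0.2, "detail": 0.2, "specialization": 0.2}


def relevance(scores: CriteriaScores, weights: dict[str, float]) -> float:
    """Weighted sum of the criterion scores named in `weights`."""
    return sum(w * getattr(scores, name) for name, w in weights.items())


def recommend(candidates: dict[str, CriteriaScores],
              weights: Optional[dict[str, float]] = None) -> list[tuple[str, float]]:
    """Rank candidate ontologies by weighted relevance, best first."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    ranked = [(name, relevance(s, weights)) for name, s in candidates.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = {
        "ONTO_A": CriteriaScores(coverage=0.9, acceptance=0.3, detail=0.5, specialization=0.2),
        "ONTO_B": CriteriaScores(coverage=0.6, acceptance=0.9, detail=0.7, specialization=0.8),
    }
    print(recommend(candidates))                     # ONTO_B ranks first
    print(recommend(candidates, {"coverage": 1.0}))  # coverage only: ONTO_A ranks first
```

The example shows why weighting matters: an ontology with the best raw coverage can still rank second once acceptance, detail, and specialization are taken into account.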

    Preference Dissemination by Sharing Viewpoints: Simulating Serendipity

    IC3K 2015 will be held in conjunction with IJCCI 2015
    International audience
    The Web currently stores two types of content: linked data from the semantic Web and user contributions from the social Web. Our aim is to represent simplified aspects of both within a unified topological model and to harvest the benefits of integrating them in order to prompt collective learning and knowledge discovery. In particular, we wish to capture the phenomenon of Serendipity (i.e., incidental learning) using a subjective knowledge representation formalism in which several "viewpoints" are individually interpretable from a knowledge graph. We validate our Viewpoints approach by demonstrating the collective learning capacity it enables. To that end, we build a simulation that disseminates knowledge with linked data and user contributions, similar to the way the Web is formed. Using a behavioral model configured to represent various Web navigation strategies, we seek to optimize the distribution of preference systems. Our results outline the most appropriate strategies for incidental learning, bringing us closer to understanding and modeling the processes involved in Serendipity. An implementation of the Viewpoints formalism kernel is available. The underlying Viewpoints model allows us to abstract and generalize our current proof of concept to the indexing of any type of dataset.

    Automatic term extraction combining different types of information (Extraction automatique de termes combinant différentes informations)

    National audience
    For a community, terminology is essential because it makes it possible to describe, exchange, and retrieve data. In many domains, the explosion in the volume of textual data makes it necessary to automate the process of terminology extraction, and even its enrichment. Automatic term extraction can rely on natural language processing approaches. Methods proposed in the literature that take linguistic and statistical aspects into account solve some of the problems related to term extraction, such as low frequency, the complexity of extracting multi-word terms, or the human effort needed to validate candidate terms. In this context, we propose two new measures for extracting and ranking multi-word terms from domain-specific corpora. In addition, we show how using the Web to assess the importance of a candidate term improves results in terms of precision. These experiments are carried out on the GENIA biomedical corpus, using measures from the literature such as C-value.
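C-value, the baseline measure named above (introduced by Frantzi, Ananiadou and Mima), weights a candidate term by its length in words and its frequency, discounting occurrences that only appear inside longer candidates. Below is a minimal sketch of the original formulation, assuming candidate terms and their corpus frequencies have already been extracted by linguistic patterns; the example terms and counts are invented. In this formulation the factor log2|a| equals 0 for single-word terms.

```python
import math


def _is_nested(shorter: tuple[str, ...], longer: tuple[str, ...]) -> bool:
    """True if `shorter` appears as a contiguous word sequence inside a longer term."""
    n = len(shorter)
    if n >= len(longer):
        return False
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freqs: dict[str, int]) -> dict[str, float]:
    """C-value of each candidate term, given candidate-term frequencies in the corpus.

    C-value(a) = log2|a| * f(a)                                       if a is not nested
    C-value(a) = log2|a| * (f(a) - (1/|T_a|) * sum_{b in T_a} f(b))   otherwise,
    where |a| is the word count of a and T_a the longer candidates containing a.
    """
    words = {term: tuple(term.split()) for term in freqs}
    scores: dict[str, float] = {}
    for term, tokens in words.items():
        containers = [other for other, o_tokens in words.items() if _is_nested(tokens, o_tokens)]
        length_weight = math.log2(len(tokens))  # 0 for single-word terms
        if containers:
            mean_container_freq = sum(freqs[b] for b in containers) / len(containers)
            scores[term] = length_weight * (freqs[term] - mean_container_freq)
        else:
            scores[term] = length_weight * freqs[term]
    return scores


if __name__ == "__main__":
    # Hypothetical frequencies of candidate terms already filtered by linguistic patterns.
    freqs = {"t cell": 10, "t cell receptor": 4, "activated t cell": 3}
    for term, score in sorted(c_value(freqs).items(), key=lambda kv: -kv[1]):
        print(f"{term}: {score:.2f}")
```

Here "t cell" occurs 10 times but is nested in two longer candidates, so its score is reduced by their mean frequency (3.5) before the length weight is applied.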

    Biomedical Terminology Extraction: A new combination of Statistical and Web Mining Approaches

    International audience
    The objective of this work is to combine statistical and web mining methods for the automatic extraction and ranking of biomedical terms from free text. We present new extraction methods that use linguistic patterns specialized for the biomedical field, together with term extraction measures, such as C-value, and keyword extraction measures, such as Okapi BM25 and TF-IDF. We propose several combinations of these measures to improve the extraction and ranking process, and we investigate which combinations are most relevant in different cases. Each measure yields a ranked list of candidate terms, which we finally re-rank with a new web-based measure. Our experiments show, first, that an appropriate harmonic mean of C-value and a keyword extraction measure achieves better precision than C-value used alone, for the extraction of both single-word and multi-word terms; and second, that the best precision is often obtained when we re-rank using the web-based measure. We illustrate our results on the extraction of English and French biomedical terms from a corpus of laboratory tests available online in both languages. The results are validated using UMLS (for English) and MeSH alone (for French) as reference dictionaries.
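The combination step can be sketched as a harmonic mean of two per-term scores, for example C-value and TF-IDF. This sketch assumes both measures produce non-negative scores and that each is first divided by its maximum so neither dominates; that normalization, the function names, and the input scores are assumptions for illustration, not the exact formulation evaluated in the paper.

```python
def max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale non-negative scores into [0, 1] by dividing by the maximum."""
    top = max(scores.values(), default=0.0)
    return {t: (s / top if top > 0 else 0.0) for t, s in scores.items()}


def harmonic_combination(first: dict[str, float],
                         second: dict[str, float]) -> list[tuple[str, float]]:
    """Rank terms by the harmonic mean of two normalized measures."""
    a, b = max_normalize(first), max_normalize(second)
    combined = {}
    for term in a.keys() & b.keys():  # only terms scored by both measures
        x, y = a[term], b[term]
        combined[term] = 2 * x * y / (x + y) if x + y > 0 else 0.0
    # Best score first; ties broken alphabetically so the order is deterministic.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical scores for three candidate terms.
    cvalue = {"t cell": 6.5, "t cell receptor": 6.34, "cell line": 2.0}
    tfidf = {"t cell": 0.2, "t cell receptor": 0.8, "cell line": 0.4}
    for term, score in harmonic_combination(cvalue, tfidf):
        print(f"{term}: {score:.3f}")  # "t cell receptor" comes first
```

A harmonic mean rewards terms that score well on both measures: "t cell" has the top C-value here but a low TF-IDF, so it falls behind "t cell receptor".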

    A Way to Automatically Enrich Biomedical Ontologies

    International audience
    Biomedical ontologies play an important role in information extraction in the biomedical domain. We present a four-step workflow for automatically updating biomedical ontologies. We detail two contributions concerning concept extraction and the semantic linkage of the extracted terminology.

    Biomedical term extraction: overview and a new methodology

    International audience
    Terminology extraction is an essential task in domain knowledge acquisition, as well as in Information Retrieval (IR). It is also a mandatory first step in building or enriching terminologies and ontologies. As often proposed in the literature, existing terminology extraction methods combine linguistic and statistical aspects and partially solve some problems related to term extraction, e.g., noise, silence, low frequency, large corpora, and the complexity of the multi-word term extraction process. In contrast, we propose a new methodology to extract and rank biomedical terms that addresses all of these problems. This methodology offers several measures based on linguistic, statistical, graph-based, and web-based aspects. These measures extract and rank candidate terms with high precision: we demonstrate that they outperform previously reported precision results for automatic term extraction and work with different languages (English, French, and Spanish). We also demonstrate how using graphs and the web to assess the significance of a term candidate further improves precision. We evaluated our methodology on the biomedical GENIA and LabTestsOnline corpora and compared it with previously reported measures.
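The web-based assessment of a candidate's significance is not spelled out in this abstract. As a purely hypothetical illustration, the sketch scores each candidate by the share of web pages containing all of its words that also contain it as an exact phrase, then re-ranks the top of an existing ranking. The hit counts are invented inputs rather than calls to any real search API, and `web_score`, `rerank`, and `top_k` are illustrative names.

```python
def web_score(exact_phrase_hits: int, all_words_hits: int) -> float:
    """Fraction of pages containing all the term's words that contain the exact phrase."""
    return exact_phrase_hits / all_words_hits if all_words_hits > 0 else 0.0


def rerank(ranked: list[str], hits: dict[str, tuple[int, int]], top_k: int) -> list[str]:
    """Re-order the top_k candidates by web score; the tail keeps its original order."""
    head, tail = ranked[:top_k], ranked[top_k:]
    # sorted() is stable (also with reverse=True), so equal web scores keep prior order.
    head = sorted(head, key=lambda term: web_score(*hits.get(term, (0, 0))), reverse=True)
    return head + tail


if __name__ == "__main__":
    # Invented (exact-phrase hits, all-words hits) pairs for each candidate.
    hits = {
        "cell line": (900, 3000),
        "t cell receptor": (800, 1000),
        "blood test result": (50, 5000),
    }
    ranked = ["cell line", "t cell receptor", "blood test result"]
    print(rerank(ranked, hits, top_k=2))
    # ['t cell receptor', 'cell line', 'blood test result']
```

Restricting re-ranking to the top `top_k` candidates keeps the number of (potentially costly) web queries bounded while still correcting the most visible part of the list.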